{
 "cells": [
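  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Seventh National Population Census: Data Analysis and Visualization Project Overview\n",
    "## Author: 钟莉\n",
    "## Date: 2022/07/01\n",
    "## Data source:\n",
    "> [National Bureau of Statistics of China](http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/)\n",
    "- The Seventh National Population Census data has not been released in full: only 8 national census communiqués have been published, and only 4 of them contain tabular data.\n",
    "- I therefore scraped 6 tables from the following 4 communiqués and exported them as CSV files to the data folder.\n",
    ">>1. [Communiqué of the Seventh National Population Census (No. 3)](http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818822.html)\n",
    ">>2. [Communiqué of the Seventh National Population Census (No. 4)](http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818823.html)\n",
    ">>3. [Communiqué of the Seventh National Population Census (No. 5)](http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818824.html)\n",
    ">>4. [Communiqué of the Seventh National Population Census (No. 6)](http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818825.html)\n",
    "\n",
    "## Data analysis:\n",
    "### Data scraping:\n",
    "- Use the pandas `read_html` function to quickly scrape tabular data from web pages.\n",
    "- Use the `loc` and `iloc` indexers of the DataFrame object to select data, add, modify, and delete rows and columns, and set the index, so that the result is tidy and keeps only the most useful values.\n",
    "\n",
    "### Visualization\n",
    "- Use pyecharts for data visualization. The following 9 charts were produced:\n",
    "\n",
    ">1. National population distribution (map)\n",
    ">2. Population growth in China over ten years (line chart)\n",
    ">3. Sex composition of the population by region (map)\n",
    ">4. Share of the national population by age group (pie chart)\n",
    ">5. Number of people at each education level per 100,000 population, by region (map)\n",
    ">6. University-educated population in Beijing, Shanghai, and Guangzhou (bar chart)\n",
    ">7. Degree of population aging, compared (bar chart, line chart)\n",
    ">8. The youngest generation, compared (bar chart, line chart)\n",
    ">9. Average years of schooling of the population aged 15 and above, by region (comparison chart)\n",
    "## Goals\n",
    "- Population has always been an overarching, long-term, and strategic issue for China, so we need an accurate picture of current population trends.\n",
    "- Strengthen forward-looking, strategic research on population development, and provide solid statistical support for high-quality development, for targeted population strategies and policies, and for long-term balanced population development.\n",
    "- Adjust and improve population policy, optimize the population structure, and raise the overall quality of the population.\n",
    "- Also point to new directions for industry. For example, because population aging is severe, more attention should go to healthcare for the elderly.\n",
    "\n",
    "## Note: this project has two ipynb notebooks; this one is 数据抓取.ipynb (data scraping)."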
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Core modules\n",
    "from requests_html import HTMLSession\n",
    "import requests_html\n",
    "import pandas as pd\n",
    "import urllib.parse"
   ]
  },
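  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scraping cells in this notebook all follow the same pattern: `pd.read_html` returns one DataFrame per `<table>`, the header rows arrive as ordinary data rows, and the columns are then renamed and the header rows dropped. The next cell is a minimal offline sketch of that pattern, not the notebook's actual scraping step. The inline HTML string and the names `sample_html`, `sample_tables`, and `df_sample` are hypothetical stand-ins for a communiqué page (the two figures are copied from the regional table in this notebook), and the sketch assumes an HTML parser such as lxml is installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from io import StringIO\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical miniature of a communiqué table: header cells are <td>, not <th>\n",
    "sample_html = '''\n",
    "<table>\n",
    "  <tr><td>地区</td><td>人口数</td></tr>\n",
    "  <tr><td>北京</td><td>21893095</td></tr>\n",
    "  <tr><td>天津</td><td>13866009</td></tr>\n",
    "</table>\n",
    "'''\n",
    "\n",
    "# read_html returns a list with one DataFrame per <table> element\n",
    "sample_tables = pd.read_html(StringIO(sample_html))\n",
    "df_sample = sample_tables[0]\n",
    "\n",
    "# The header row was parsed as data here, so name the columns and drop that row\n",
    "df_sample.columns = ['地区', '人口数(人)']\n",
    "df_sample = df_sample.drop(index=[0]).reset_index(drop=True)\n",
    "df_sample"
   ]
  },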
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Population composition by region"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>地区</td>\n",
       "      <td>人口数</td>\n",
       "      <td>比重[6]</td>\n",
       "      <td>比重[6]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>地区</td>\n",
       "      <td>人口数</td>\n",
       "      <td>2020年</td>\n",
       "      <td>2010年</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>全　国[5]</td>\n",
       "      <td>1411778724</td>\n",
       "      <td>100.00</td>\n",
       "      <td>100.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>北　京</td>\n",
       "      <td>21893095</td>\n",
       "      <td>1.55</td>\n",
       "      <td>1.46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>天　津</td>\n",
       "      <td>13866009</td>\n",
       "      <td>0.98</td>\n",
       "      <td>0.97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>河　北</td>\n",
       "      <td>74610235</td>\n",
       "      <td>5.28</td>\n",
       "      <td>5.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>山　西</td>\n",
       "      <td>34915616</td>\n",
       "      <td>2.47</td>\n",
       "      <td>2.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>内蒙古</td>\n",
       "      <td>24049155</td>\n",
       "      <td>1.70</td>\n",
       "      <td>1.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>辽　宁</td>\n",
       "      <td>42591407</td>\n",
       "      <td>3.02</td>\n",
       "      <td>3.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>吉　林</td>\n",
       "      <td>24073453</td>\n",
       "      <td>1.71</td>\n",
       "      <td>2.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>黑龙江</td>\n",
       "      <td>31850088</td>\n",
       "      <td>2.26</td>\n",
       "      <td>2.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>上　海</td>\n",
       "      <td>24870895</td>\n",
       "      <td>1.76</td>\n",
       "      <td>1.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>江　苏</td>\n",
       "      <td>84748016</td>\n",
       "      <td>6.00</td>\n",
       "      <td>5.87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>浙　江</td>\n",
       "      <td>64567588</td>\n",
       "      <td>4.57</td>\n",
       "      <td>4.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>安　徽</td>\n",
       "      <td>61027171</td>\n",
       "      <td>4.32</td>\n",
       "      <td>4.44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>福　建</td>\n",
       "      <td>41540086</td>\n",
       "      <td>2.94</td>\n",
       "      <td>2.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>江　西</td>\n",
       "      <td>45188635</td>\n",
       "      <td>3.20</td>\n",
       "      <td>3.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>山　东</td>\n",
       "      <td>101527453</td>\n",
       "      <td>7.19</td>\n",
       "      <td>7.15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>河　南</td>\n",
       "      <td>99365519</td>\n",
       "      <td>7.04</td>\n",
       "      <td>7.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>湖　北</td>\n",
       "      <td>57752557</td>\n",
       "      <td>4.09</td>\n",
       "      <td>4.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>湖　南</td>\n",
       "      <td>66444864</td>\n",
       "      <td>4.71</td>\n",
       "      <td>4.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>广　东</td>\n",
       "      <td>126012510</td>\n",
       "      <td>8.93</td>\n",
       "      <td>7.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>广　西</td>\n",
       "      <td>50126804</td>\n",
       "      <td>3.55</td>\n",
       "      <td>3.44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>海　南</td>\n",
       "      <td>10081232</td>\n",
       "      <td>0.71</td>\n",
       "      <td>0.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>重　庆</td>\n",
       "      <td>32054159</td>\n",
       "      <td>2.27</td>\n",
       "      <td>2.15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>四　川</td>\n",
       "      <td>83674866</td>\n",
       "      <td>5.93</td>\n",
       "      <td>6.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>贵　州</td>\n",
       "      <td>38562148</td>\n",
       "      <td>2.73</td>\n",
       "      <td>2.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>云　南</td>\n",
       "      <td>47209277</td>\n",
       "      <td>3.34</td>\n",
       "      <td>3.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>西　藏</td>\n",
       "      <td>3648100</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>陕　西</td>\n",
       "      <td>39528999</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>甘　肃</td>\n",
       "      <td>25019831</td>\n",
       "      <td>1.77</td>\n",
       "      <td>1.91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>青　海</td>\n",
       "      <td>5923957</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0.42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>宁　夏</td>\n",
       "      <td>7202654</td>\n",
       "      <td>0.51</td>\n",
       "      <td>0.47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>新　疆</td>\n",
       "      <td>25852345</td>\n",
       "      <td>1.83</td>\n",
       "      <td>1.63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>现役军人</td>\n",
       "      <td>2000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         0           1       2       3\n",
       "0       地区         人口数   比重[6]   比重[6]\n",
       "1       地区         人口数   2020年   2010年\n",
       "2   全　国[5]  1411778724  100.00  100.00\n",
       "3      北　京    21893095    1.55    1.46\n",
       "4      天　津    13866009    0.98    0.97\n",
       "5      河　北    74610235    5.28    5.36\n",
       "6      山　西    34915616    2.47    2.67\n",
       "7      内蒙古    24049155    1.70    1.84\n",
       "8      辽　宁    42591407    3.02    3.27\n",
       "9      吉　林    24073453    1.71    2.05\n",
       "10     黑龙江    31850088    2.26    2.86\n",
       "11     上　海    24870895    1.76    1.72\n",
       "12     江　苏    84748016    6.00    5.87\n",
       "13     浙　江    64567588    4.57    4.06\n",
       "14     安　徽    61027171    4.32    4.44\n",
       "15     福　建    41540086    2.94    2.75\n",
       "16     江　西    45188635    3.20    3.33\n",
       "17     山　东   101527453    7.19    7.15\n",
       "18     河　南    99365519    7.04    7.02\n",
       "19     湖　北    57752557    4.09    4.27\n",
       "20     湖　南    66444864    4.71    4.90\n",
       "21     广　东   126012510    8.93    7.79\n",
       "22     广　西    50126804    3.55    3.44\n",
       "23     海　南    10081232    0.71    0.65\n",
       "24     重　庆    32054159    2.27    2.15\n",
       "25     四　川    83674866    5.93    6.00\n",
       "26     贵　州    38562148    2.73    2.59\n",
       "27     云　南    47209277    3.34    3.43\n",
       "28     西　藏     3648100    0.26    0.22\n",
       "29     陕　西    39528999    2.80    2.79\n",
       "30     甘　肃    25019831    1.77    1.91\n",
       "31     青　海     5923957    0.42    0.42\n",
       "32     宁　夏     7202654    0.51    0.47\n",
       "33     新　疆    25852345    1.83    1.63\n",
       "34    现役军人     2000000     NaN     NaN"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Population composition by region (Communiqué No. 3); index 1 selects the second table on the page\n",
    "url='http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818822.html'\n",
    "df_人口 = pd.read_html(url)[1]\n",
    "df_人口"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding, modifying, and deleting data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename the columns\n",
    "df_人口.columns = ['地区','人口数(人)','2020年比重(%)','2010年比重(%)']\n",
    "# df_人口"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the two header rows that were parsed as data (index 0 and 1)\n",
    "df_人口.drop(index=[0,1],inplace=True)\n",
    "# df_人口"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop rows with missing values (the 现役军人 row has NaN shares), then reset the index\n",
    "df_人口构成 = df_人口.dropna().reset_index(drop=True)\n",
    "# df_人口构成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The region names contain spaces, which would keep the map from being drawn\n",
    "# Drop the 地区 column\n",
    "df_人口构成.drop(['地区'],axis=1,inplace=True)\n",
    "# df_人口构成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>人口数(人)</th>\n",
       "      <th>2020年比重(%)</th>\n",
       "      <th>2010年比重(%)</th>\n",
       "      <th>地区</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1411778724</td>\n",
       "      <td>100.00</td>\n",
       "      <td>100.00</td>\n",
       "      <td>全国</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21893095</td>\n",
       "      <td>1.55</td>\n",
       "      <td>1.46</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13866009</td>\n",
       "      <td>0.98</td>\n",
       "      <td>0.97</td>\n",
       "      <td>天津</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>74610235</td>\n",
       "      <td>5.28</td>\n",
       "      <td>5.36</td>\n",
       "      <td>河北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>34915616</td>\n",
       "      <td>2.47</td>\n",
       "      <td>2.67</td>\n",
       "      <td>山西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>24049155</td>\n",
       "      <td>1.70</td>\n",
       "      <td>1.84</td>\n",
       "      <td>内蒙古</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>42591407</td>\n",
       "      <td>3.02</td>\n",
       "      <td>3.27</td>\n",
       "      <td>辽宁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>24073453</td>\n",
       "      <td>1.71</td>\n",
       "      <td>2.05</td>\n",
       "      <td>吉林</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>31850088</td>\n",
       "      <td>2.26</td>\n",
       "      <td>2.86</td>\n",
       "      <td>黑龙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>24870895</td>\n",
       "      <td>1.76</td>\n",
       "      <td>1.72</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>84748016</td>\n",
       "      <td>6.00</td>\n",
       "      <td>5.87</td>\n",
       "      <td>江苏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>64567588</td>\n",
       "      <td>4.57</td>\n",
       "      <td>4.06</td>\n",
       "      <td>浙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>61027171</td>\n",
       "      <td>4.32</td>\n",
       "      <td>4.44</td>\n",
       "      <td>安徽</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>41540086</td>\n",
       "      <td>2.94</td>\n",
       "      <td>2.75</td>\n",
       "      <td>福建</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>45188635</td>\n",
       "      <td>3.20</td>\n",
       "      <td>3.33</td>\n",
       "      <td>江西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>101527453</td>\n",
       "      <td>7.19</td>\n",
       "      <td>7.15</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>99365519</td>\n",
       "      <td>7.04</td>\n",
       "      <td>7.02</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>57752557</td>\n",
       "      <td>4.09</td>\n",
       "      <td>4.27</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>66444864</td>\n",
       "      <td>4.71</td>\n",
       "      <td>4.90</td>\n",
       "      <td>湖南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>126012510</td>\n",
       "      <td>8.93</td>\n",
       "      <td>7.79</td>\n",
       "      <td>广东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>50126804</td>\n",
       "      <td>3.55</td>\n",
       "      <td>3.44</td>\n",
       "      <td>广西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>10081232</td>\n",
       "      <td>0.71</td>\n",
       "      <td>0.65</td>\n",
       "      <td>海南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>32054159</td>\n",
       "      <td>2.27</td>\n",
       "      <td>2.15</td>\n",
       "      <td>重庆</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>83674866</td>\n",
       "      <td>5.93</td>\n",
       "      <td>6.00</td>\n",
       "      <td>四川</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>38562148</td>\n",
       "      <td>2.73</td>\n",
       "      <td>2.59</td>\n",
       "      <td>贵州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>47209277</td>\n",
       "      <td>3.34</td>\n",
       "      <td>3.43</td>\n",
       "      <td>云南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>3648100</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.22</td>\n",
       "      <td>西藏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>39528999</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.79</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>25019831</td>\n",
       "      <td>1.77</td>\n",
       "      <td>1.91</td>\n",
       "      <td>甘肃</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>5923957</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0.42</td>\n",
       "      <td>青海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>7202654</td>\n",
       "      <td>0.51</td>\n",
       "      <td>0.47</td>\n",
       "      <td>宁夏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>25852345</td>\n",
       "      <td>1.83</td>\n",
       "      <td>1.63</td>\n",
       "      <td>新疆</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        人口数(人) 2020年比重(%) 2010年比重(%)   地区\n",
       "0   1411778724     100.00     100.00   全国\n",
       "1     21893095       1.55       1.46   北京\n",
       "2     13866009       0.98       0.97   天津\n",
       "3     74610235       5.28       5.36   河北\n",
       "4     34915616       2.47       2.67   山西\n",
       "5     24049155       1.70       1.84  内蒙古\n",
       "6     42591407       3.02       3.27   辽宁\n",
       "7     24073453       1.71       2.05   吉林\n",
       "8     31850088       2.26       2.86  黑龙江\n",
       "9     24870895       1.76       1.72   上海\n",
       "10    84748016       6.00       5.87   江苏\n",
       "11    64567588       4.57       4.06   浙江\n",
       "12    61027171       4.32       4.44   安徽\n",
       "13    41540086       2.94       2.75   福建\n",
       "14    45188635       3.20       3.33   江西\n",
       "15   101527453       7.19       7.15   山东\n",
       "16    99365519       7.04       7.02   河南\n",
       "17    57752557       4.09       4.27   湖北\n",
       "18    66444864       4.71       4.90   湖南\n",
       "19   126012510       8.93       7.79   广东\n",
       "20    50126804       3.55       3.44   广西\n",
       "21    10081232       0.71       0.65   海南\n",
       "22    32054159       2.27       2.15   重庆\n",
       "23    83674866       5.93       6.00   四川\n",
       "24    38562148       2.73       2.59   贵州\n",
       "25    47209277       3.34       3.43   云南\n",
       "26     3648100       0.26       0.22   西藏\n",
       "27    39528999       2.80       2.79   陕西\n",
       "28    25019831       1.77       1.91   甘肃\n",
       "29     5923957       0.42       0.42   青海\n",
       "30     7202654       0.51       0.47   宁夏\n",
       "31    25852345       1.83       1.63   新疆"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add the region names back without spaces, in the same order as the rows\n",
    "df_人口构成.loc[:,'地区'] = ['全国','北京','天津','河北','山西','内蒙古','辽宁','吉林','黑龙江','上海','江苏','浙江','安徽','福建','江西','山东','河南','湖北','湖南','广东','广西','海南','重庆','四川','贵州','云南','西藏','陕西','甘肃','青海','宁夏','新疆']\n",
    "df_人口构成"
   ]
  },
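  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retyping the 32 region names by hand works, but it depends on the list staying in exactly the same order as the rows. As an alternative sketch, the scraped names could be cleaned with pandas string methods instead. This assumes the gap inside names such as 北京 is the full-width space U+3000 and that footnote markers look like `[5]`. `raw_regions` and `clean_regions` are hypothetical names, and `raw_regions` stands in for the scraped 地区 column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for the scraped 地区 column\n",
    "raw_regions = pd.Series(['全\\u3000国[5]', '北\\u3000京', '内蒙古'])\n",
    "\n",
    "clean_regions = (\n",
    "    raw_regions\n",
    "    .str.replace(r'\\[\\d+\\]', '', regex=True)  # remove footnote markers such as [5]\n",
    "    .str.replace('\\u3000', '', regex=False)   # remove full-width spaces (assumed to be U+3000)\n",
    ")\n",
    "clean_regions.tolist()"
   ]
  },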
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_人口构成.to_csv('data/各地区人口构成数据.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# National population by age group"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>年龄</td>\n",
       "      <td>人口数</td>\n",
       "      <td>比重</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>年龄</td>\n",
       "      <td>人口数</td>\n",
       "      <td>比重</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>总　计</td>\n",
       "      <td>1411778724</td>\n",
       "      <td>100.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0—14岁</td>\n",
       "      <td>253383938</td>\n",
       "      <td>17.95</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>15—59岁</td>\n",
       "      <td>894376020</td>\n",
       "      <td>63.35</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>60岁及以上</td>\n",
       "      <td>264018766</td>\n",
       "      <td>18.70</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>其中：65岁及以上</td>\n",
       "      <td>190635280</td>\n",
       "      <td>13.50</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           0           1       2   3\n",
       "0         年龄         人口数      比重 NaN\n",
       "1         年龄         人口数      比重 NaN\n",
       "2        总　计  1411778724  100.00 NaN\n",
       "3      0—14岁   253383938   17.95 NaN\n",
       "4     15—59岁   894376020   63.35 NaN\n",
       "5     60岁及以上   264018766   18.70 NaN\n",
       "6  其中：65岁及以上   190635280   13.50 NaN"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# National population by age group (Communiqué No. 5); index 1 selects the second table on the page\n",
    "url='http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818824.html'\n",
    "df_全国人口年龄 = pd.read_html(url)[1]\n",
    "df_全国人口年龄"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding, modifying, and deleting data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the empty fourth column (column label 3)\n",
    "df_全国人口年龄.drop([3],axis=1,inplace=True)\n",
    "# df_全国人口年龄"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename the columns\n",
    "df_全国人口年龄.columns = ['年龄','人口数(人)','比重(%)']\n",
    "# df_全国人口年龄"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the two header rows that were parsed as data (index 0 and 1)\n",
    "df_全国人口年龄.drop(index=[0,1],inplace=True)\n",
    "# df_全国人口年龄"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>人口数(人)</th>\n",
       "      <th>比重(%)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>总　计</td>\n",
       "      <td>1411778724</td>\n",
       "      <td>100.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0—14岁</td>\n",
       "      <td>253383938</td>\n",
       "      <td>17.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>15—59岁</td>\n",
       "      <td>894376020</td>\n",
       "      <td>63.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>60岁及以上</td>\n",
       "      <td>264018766</td>\n",
       "      <td>18.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>其中：65岁及以上</td>\n",
       "      <td>190635280</td>\n",
       "      <td>13.50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          年龄      人口数(人)   比重(%)\n",
       "0        总　计  1411778724  100.00\n",
       "1      0—14岁   253383938   17.95\n",
       "2     15—59岁   894376020   63.35\n",
       "3     60岁及以上   264018766   18.70\n",
       "4  其中：65岁及以上   190635280   13.50"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop any remaining rows with missing values, then reset the index\n",
    "df_全国人口年龄构成 = df_全国人口年龄.dropna().reset_index(drop=True)\n",
    "df_全国人口年龄构成"
   ]
  },
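  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the header rows were parsed into the same columns as the data, the remaining values are most likely still strings (object dtype) rather than numbers. If numeric columns are wanted before plotting, one option is `pd.to_numeric`. The next cell is a sketch on a hypothetical frame, `df_demo`, that copies two rows from the age table in this notebook. It is not a step in the original workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical copy of two cleaned rows, with the numbers still stored as strings\n",
    "df_demo = pd.DataFrame({\n",
    "    '年龄': ['60岁及以上', '其中：65岁及以上'],\n",
    "    '人口数(人)': ['264018766', '190635280'],\n",
    "    '比重(%)': ['18.70', '13.50'],\n",
    "})\n",
    "\n",
    "# Convert the numeric columns; errors='coerce' turns unparseable text into NaN\n",
    "for col in ['人口数(人)', '比重(%)']:\n",
    "    df_demo[col] = pd.to_numeric(df_demo[col], errors='coerce')\n",
    "\n",
    "df_demo.dtypes"
   ]
  },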
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_全国人口年龄构成.to_csv('data/全国人口年龄构成数据.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Population age composition by region"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>地区</td>\n",
       "      <td>比重</td>\n",
       "      <td>比重</td>\n",
       "      <td>比重</td>\n",
       "      <td>比重</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>地区</td>\n",
       "      <td>0—14岁</td>\n",
       "      <td>15—59岁</td>\n",
       "      <td>60岁及以上</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>地区</td>\n",
       "      <td>0—14岁</td>\n",
       "      <td>15—59岁</td>\n",
       "      <td>60岁及以上</td>\n",
       "      <td>其中：65岁及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>全　国</td>\n",
       "      <td>17.95</td>\n",
       "      <td>63.35</td>\n",
       "      <td>18.70</td>\n",
       "      <td>13.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>北　京</td>\n",
       "      <td>11.84</td>\n",
       "      <td>68.53</td>\n",
       "      <td>19.63</td>\n",
       "      <td>13.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>天　津</td>\n",
       "      <td>13.47</td>\n",
       "      <td>64.87</td>\n",
       "      <td>21.66</td>\n",
       "      <td>14.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>河　北</td>\n",
       "      <td>20.22</td>\n",
       "      <td>59.92</td>\n",
       "      <td>19.85</td>\n",
       "      <td>13.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>山　西</td>\n",
       "      <td>16.35</td>\n",
       "      <td>64.72</td>\n",
       "      <td>18.92</td>\n",
       "      <td>12.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>内蒙古</td>\n",
       "      <td>14.04</td>\n",
       "      <td>66.17</td>\n",
       "      <td>19.78</td>\n",
       "      <td>13.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>辽　宁</td>\n",
       "      <td>11.12</td>\n",
       "      <td>63.16</td>\n",
       "      <td>25.72</td>\n",
       "      <td>17.42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>吉　林</td>\n",
       "      <td>11.71</td>\n",
       "      <td>65.23</td>\n",
       "      <td>23.06</td>\n",
       "      <td>15.61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>黑龙江</td>\n",
       "      <td>10.32</td>\n",
       "      <td>66.46</td>\n",
       "      <td>23.22</td>\n",
       "      <td>15.61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>上　海</td>\n",
       "      <td>9.80</td>\n",
       "      <td>66.82</td>\n",
       "      <td>23.38</td>\n",
       "      <td>16.28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>江　苏</td>\n",
       "      <td>15.21</td>\n",
       "      <td>62.95</td>\n",
       "      <td>21.84</td>\n",
       "      <td>16.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>浙　江</td>\n",
       "      <td>13.45</td>\n",
       "      <td>67.86</td>\n",
       "      <td>18.70</td>\n",
       "      <td>13.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>安　徽</td>\n",
       "      <td>19.24</td>\n",
       "      <td>61.96</td>\n",
       "      <td>18.79</td>\n",
       "      <td>15.01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>福　建</td>\n",
       "      <td>19.32</td>\n",
       "      <td>64.70</td>\n",
       "      <td>15.98</td>\n",
       "      <td>11.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>江　西</td>\n",
       "      <td>21.96</td>\n",
       "      <td>61.17</td>\n",
       "      <td>16.87</td>\n",
       "      <td>11.89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>山　东</td>\n",
       "      <td>18.78</td>\n",
       "      <td>60.32</td>\n",
       "      <td>20.90</td>\n",
       "      <td>15.13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>河　南</td>\n",
       "      <td>23.14</td>\n",
       "      <td>58.79</td>\n",
       "      <td>18.08</td>\n",
       "      <td>13.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>湖　北</td>\n",
       "      <td>16.31</td>\n",
       "      <td>63.26</td>\n",
       "      <td>20.42</td>\n",
       "      <td>14.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>湖　南</td>\n",
       "      <td>19.52</td>\n",
       "      <td>60.60</td>\n",
       "      <td>19.88</td>\n",
       "      <td>14.81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>广　东</td>\n",
       "      <td>18.85</td>\n",
       "      <td>68.80</td>\n",
       "      <td>12.35</td>\n",
       "      <td>8.58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>广　西</td>\n",
       "      <td>23.63</td>\n",
       "      <td>59.69</td>\n",
       "      <td>16.69</td>\n",
       "      <td>12.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>海　南</td>\n",
       "      <td>19.97</td>\n",
       "      <td>65.38</td>\n",
       "      <td>14.65</td>\n",
       "      <td>10.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>重　庆</td>\n",
       "      <td>15.91</td>\n",
       "      <td>62.22</td>\n",
       "      <td>21.87</td>\n",
       "      <td>17.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>四　川</td>\n",
       "      <td>16.10</td>\n",
       "      <td>62.19</td>\n",
       "      <td>21.71</td>\n",
       "      <td>16.93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>贵　州</td>\n",
       "      <td>23.97</td>\n",
       "      <td>60.65</td>\n",
       "      <td>15.38</td>\n",
       "      <td>11.56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>云　南</td>\n",
       "      <td>19.57</td>\n",
       "      <td>65.52</td>\n",
       "      <td>14.91</td>\n",
       "      <td>10.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>西　藏</td>\n",
       "      <td>24.53</td>\n",
       "      <td>66.95</td>\n",
       "      <td>8.52</td>\n",
       "      <td>5.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>陕　西</td>\n",
       "      <td>17.33</td>\n",
       "      <td>63.46</td>\n",
       "      <td>19.20</td>\n",
       "      <td>13.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>甘　肃</td>\n",
       "      <td>19.40</td>\n",
       "      <td>63.57</td>\n",
       "      <td>17.03</td>\n",
       "      <td>12.58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>青　海</td>\n",
       "      <td>20.81</td>\n",
       "      <td>67.04</td>\n",
       "      <td>12.14</td>\n",
       "      <td>8.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>宁　夏</td>\n",
       "      <td>20.38</td>\n",
       "      <td>66.09</td>\n",
       "      <td>13.52</td>\n",
       "      <td>9.62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>新　疆</td>\n",
       "      <td>22.46</td>\n",
       "      <td>66.26</td>\n",
       "      <td>11.28</td>\n",
       "      <td>7.76</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      0      1       2       3          4\n",
       "0    地区     比重      比重      比重         比重\n",
       "1    地区  0—14岁  15—59岁  60岁及以上        NaN\n",
       "2    地区  0—14岁  15—59岁  60岁及以上  其中：65岁及以上\n",
       "3   全　国  17.95   63.35   18.70      13.50\n",
       "4   北　京  11.84   68.53   19.63      13.30\n",
       "5   天　津  13.47   64.87   21.66      14.75\n",
       "6   河　北  20.22   59.92   19.85      13.92\n",
       "7   山　西  16.35   64.72   18.92      12.90\n",
       "8   内蒙古  14.04   66.17   19.78      13.05\n",
       "9   辽　宁  11.12   63.16   25.72      17.42\n",
       "10  吉　林  11.71   65.23   23.06      15.61\n",
       "11  黑龙江  10.32   66.46   23.22      15.61\n",
       "12  上　海   9.80   66.82   23.38      16.28\n",
       "13  江　苏  15.21   62.95   21.84      16.20\n",
       "14  浙　江  13.45   67.86   18.70      13.27\n",
       "15  安　徽  19.24   61.96   18.79      15.01\n",
       "16  福　建  19.32   64.70   15.98      11.10\n",
       "17  江　西  21.96   61.17   16.87      11.89\n",
       "18  山　东  18.78   60.32   20.90      15.13\n",
       "19  河　南  23.14   58.79   18.08      13.49\n",
       "20  湖　北  16.31   63.26   20.42      14.59\n",
       "21  湖　南  19.52   60.60   19.88      14.81\n",
       "22  广　东  18.85   68.80   12.35       8.58\n",
       "23  广　西  23.63   59.69   16.69      12.20\n",
       "24  海　南  19.97   65.38   14.65      10.43\n",
       "25  重　庆  15.91   62.22   21.87      17.08\n",
       "26  四　川  16.10   62.19   21.71      16.93\n",
       "27  贵　州  23.97   60.65   15.38      11.56\n",
       "28  云　南  19.57   65.52   14.91      10.75\n",
       "29  西　藏  24.53   66.95    8.52       5.67\n",
       "30  陕　西  17.33   63.46   19.20      13.32\n",
       "31  甘　肃  19.40   63.57   17.03      12.58\n",
       "32  青　海  20.81   67.04   12.14       8.68\n",
       "33  宁　夏  20.38   66.09   13.52       9.62\n",
       "34  新　疆  22.46   66.26   11.28       7.76"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Age composition of the population by region\n",
    "url='http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818824.html'\n",
    "df_各地区人口年龄=pd.read_html(url)[2]\n",
    "df_各地区人口年龄"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding, modifying, and deleting data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename the columns\n",
    "df_各地区人口年龄.columns = ['地区','0-14岁比重(%)','15-59岁比重(%)','60岁及以上比重(%)','其中65岁及以上比重(%)']\n",
    "# df_各地区人口年龄"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the three header rows that read_html parsed as data (index 0, 1, 2)\n",
    "df_各地区人口年龄.drop(index=[0,1,2],inplace=True)\n",
    "# df_各地区人口年龄"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Drop any rows containing NaN and reset the index so it starts at 0\n",
    "df_各地区人口年龄构成 = df_各地区人口年龄.dropna().reset_index(drop=True)\n",
    "# df_各地区人口年龄构成"
   ]
  },
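  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The region names contain full-width spaces (e.g. '全　国'), which keeps them from matching on the map\n",
    "# Drop the 地区 (region) column\n",
    "df_各地区人口年龄构成.drop(['地区'],axis=1,inplace=True)\n",
    "# df_各地区人口年龄构成"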
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0-14岁比重(%)</th>\n",
       "      <th>15-59岁比重(%)</th>\n",
       "      <th>60岁及以上比重(%)</th>\n",
       "      <th>其中65岁及以上比重(%)</th>\n",
       "      <th>地区</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17.95</td>\n",
       "      <td>63.35</td>\n",
       "      <td>18.70</td>\n",
       "      <td>13.50</td>\n",
       "      <td>全国</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>11.84</td>\n",
       "      <td>68.53</td>\n",
       "      <td>19.63</td>\n",
       "      <td>13.30</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13.47</td>\n",
       "      <td>64.87</td>\n",
       "      <td>21.66</td>\n",
       "      <td>14.75</td>\n",
       "      <td>天津</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>20.22</td>\n",
       "      <td>59.92</td>\n",
       "      <td>19.85</td>\n",
       "      <td>13.92</td>\n",
       "      <td>河北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>16.35</td>\n",
       "      <td>64.72</td>\n",
       "      <td>18.92</td>\n",
       "      <td>12.90</td>\n",
       "      <td>山西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>14.04</td>\n",
       "      <td>66.17</td>\n",
       "      <td>19.78</td>\n",
       "      <td>13.05</td>\n",
       "      <td>内蒙古</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>11.12</td>\n",
       "      <td>63.16</td>\n",
       "      <td>25.72</td>\n",
       "      <td>17.42</td>\n",
       "      <td>辽宁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>11.71</td>\n",
       "      <td>65.23</td>\n",
       "      <td>23.06</td>\n",
       "      <td>15.61</td>\n",
       "      <td>吉林</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>10.32</td>\n",
       "      <td>66.46</td>\n",
       "      <td>23.22</td>\n",
       "      <td>15.61</td>\n",
       "      <td>黑龙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9.80</td>\n",
       "      <td>66.82</td>\n",
       "      <td>23.38</td>\n",
       "      <td>16.28</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>15.21</td>\n",
       "      <td>62.95</td>\n",
       "      <td>21.84</td>\n",
       "      <td>16.20</td>\n",
       "      <td>江苏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>13.45</td>\n",
       "      <td>67.86</td>\n",
       "      <td>18.70</td>\n",
       "      <td>13.27</td>\n",
       "      <td>浙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>19.24</td>\n",
       "      <td>61.96</td>\n",
       "      <td>18.79</td>\n",
       "      <td>15.01</td>\n",
       "      <td>安徽</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>19.32</td>\n",
       "      <td>64.70</td>\n",
       "      <td>15.98</td>\n",
       "      <td>11.10</td>\n",
       "      <td>福建</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>21.96</td>\n",
       "      <td>61.17</td>\n",
       "      <td>16.87</td>\n",
       "      <td>11.89</td>\n",
       "      <td>江西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>18.78</td>\n",
       "      <td>60.32</td>\n",
       "      <td>20.90</td>\n",
       "      <td>15.13</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>23.14</td>\n",
       "      <td>58.79</td>\n",
       "      <td>18.08</td>\n",
       "      <td>13.49</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>16.31</td>\n",
       "      <td>63.26</td>\n",
       "      <td>20.42</td>\n",
       "      <td>14.59</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>19.52</td>\n",
       "      <td>60.60</td>\n",
       "      <td>19.88</td>\n",
       "      <td>14.81</td>\n",
       "      <td>湖南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>18.85</td>\n",
       "      <td>68.80</td>\n",
       "      <td>12.35</td>\n",
       "      <td>8.58</td>\n",
       "      <td>广东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>23.63</td>\n",
       "      <td>59.69</td>\n",
       "      <td>16.69</td>\n",
       "      <td>12.20</td>\n",
       "      <td>广西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>19.97</td>\n",
       "      <td>65.38</td>\n",
       "      <td>14.65</td>\n",
       "      <td>10.43</td>\n",
       "      <td>海南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>15.91</td>\n",
       "      <td>62.22</td>\n",
       "      <td>21.87</td>\n",
       "      <td>17.08</td>\n",
       "      <td>重庆</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>16.10</td>\n",
       "      <td>62.19</td>\n",
       "      <td>21.71</td>\n",
       "      <td>16.93</td>\n",
       "      <td>四川</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>23.97</td>\n",
       "      <td>60.65</td>\n",
       "      <td>15.38</td>\n",
       "      <td>11.56</td>\n",
       "      <td>贵州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>19.57</td>\n",
       "      <td>65.52</td>\n",
       "      <td>14.91</td>\n",
       "      <td>10.75</td>\n",
       "      <td>云南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>24.53</td>\n",
       "      <td>66.95</td>\n",
       "      <td>8.52</td>\n",
       "      <td>5.67</td>\n",
       "      <td>西藏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>17.33</td>\n",
       "      <td>63.46</td>\n",
       "      <td>19.20</td>\n",
       "      <td>13.32</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>19.40</td>\n",
       "      <td>63.57</td>\n",
       "      <td>17.03</td>\n",
       "      <td>12.58</td>\n",
       "      <td>甘肃</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>20.81</td>\n",
       "      <td>67.04</td>\n",
       "      <td>12.14</td>\n",
       "      <td>8.68</td>\n",
       "      <td>青海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>20.38</td>\n",
       "      <td>66.09</td>\n",
       "      <td>13.52</td>\n",
       "      <td>9.62</td>\n",
       "      <td>宁夏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>22.46</td>\n",
       "      <td>66.26</td>\n",
       "      <td>11.28</td>\n",
       "      <td>7.76</td>\n",
       "      <td>新疆</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0-14岁比重(%) 15-59岁比重(%) 60岁及以上比重(%) 其中65岁及以上比重(%)   地区\n",
       "0       17.95       63.35       18.70         13.50   全国\n",
       "1       11.84       68.53       19.63         13.30   北京\n",
       "2       13.47       64.87       21.66         14.75   天津\n",
       "3       20.22       59.92       19.85         13.92   河北\n",
       "4       16.35       64.72       18.92         12.90   山西\n",
       "5       14.04       66.17       19.78         13.05  内蒙古\n",
       "6       11.12       63.16       25.72         17.42   辽宁\n",
       "7       11.71       65.23       23.06         15.61   吉林\n",
       "8       10.32       66.46       23.22         15.61  黑龙江\n",
       "9        9.80       66.82       23.38         16.28   上海\n",
       "10      15.21       62.95       21.84         16.20   江苏\n",
       "11      13.45       67.86       18.70         13.27   浙江\n",
       "12      19.24       61.96       18.79         15.01   安徽\n",
       "13      19.32       64.70       15.98         11.10   福建\n",
       "14      21.96       61.17       16.87         11.89   江西\n",
       "15      18.78       60.32       20.90         15.13   山东\n",
       "16      23.14       58.79       18.08         13.49   河南\n",
       "17      16.31       63.26       20.42         14.59   湖北\n",
       "18      19.52       60.60       19.88         14.81   湖南\n",
       "19      18.85       68.80       12.35          8.58   广东\n",
       "20      23.63       59.69       16.69         12.20   广西\n",
       "21      19.97       65.38       14.65         10.43   海南\n",
       "22      15.91       62.22       21.87         17.08   重庆\n",
       "23      16.10       62.19       21.71         16.93   四川\n",
       "24      23.97       60.65       15.38         11.56   贵州\n",
       "25      19.57       65.52       14.91         10.75   云南\n",
       "26      24.53       66.95        8.52          5.67   西藏\n",
       "27      17.33       63.46       19.20         13.32   陕西\n",
       "28      19.40       63.57       17.03         12.58   甘肃\n",
       "29      20.81       67.04       12.14          8.68   青海\n",
       "30      20.38       66.09       13.52          9.62   宁夏\n",
       "31      22.46       66.26       11.28          7.76   新疆"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add the region names back, without the spaces\n",
    "df_各地区人口年龄构成.loc[:,'地区'] = ['全国','北京','天津','河北','山西','内蒙古','辽宁','吉林','黑龙江','上海','江苏','浙江','安徽','福建','江西','山东','河南','湖北','湖南','广东','广西','海南','重庆','四川','贵州','云南','西藏','陕西','甘肃','青海','宁夏','新疆']\n",
    "df_各地区人口年龄构成"
   ]
  },
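  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cells above drop the 地区 column and retype all 32 region names by hand. A possible alternative, sketched here on a hypothetical three-row sample rather than on the scraped table, removes the padding in place. It assumes the padding character is the ideographic space U+3000; `raw` and `cleaned` are illustrative names that do not appear elsewhere in this notebook. Cleaning in place keeps each name attached to its own row, so the result does not depend on a hand-typed list matching the row order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical miniature of the scraped table; two names are padded with U+3000\n",
    "raw = pd.DataFrame({\n",
    "    '地区': ['全\\u3000国', '北\\u3000京', '内蒙古'],\n",
    "    '0-14岁比重(%)': ['17.95', '11.84', '14.04'],\n",
    "})\n",
    "\n",
    "# Work on a copy so the sample stays untouched\n",
    "cleaned = raw.copy()\n",
    "# Remove the ideographic space (assumed to be the only padding character)\n",
    "cleaned['地区'] = cleaned['地区'].str.replace('\\u3000', '', regex=False)\n",
    "\n",
    "print(cleaned['地区'].tolist())"
   ]
  },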
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_各地区人口年龄构成.to_csv('data/各地区人口年龄构成数据.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Population sex composition data by region"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>地区</td>\n",
       "      <td>比重</td>\n",
       "      <td>比重</td>\n",
       "      <td>性别比</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>地区</td>\n",
       "      <td>男</td>\n",
       "      <td>女</td>\n",
       "      <td>性别比</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>全　国</td>\n",
       "      <td>51.24</td>\n",
       "      <td>48.76</td>\n",
       "      <td>105.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>北　京</td>\n",
       "      <td>51.14</td>\n",
       "      <td>48.86</td>\n",
       "      <td>104.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>天　津</td>\n",
       "      <td>51.53</td>\n",
       "      <td>48.47</td>\n",
       "      <td>106.31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>河　北</td>\n",
       "      <td>50.50</td>\n",
       "      <td>49.50</td>\n",
       "      <td>102.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>山　西</td>\n",
       "      <td>50.99</td>\n",
       "      <td>49.01</td>\n",
       "      <td>104.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>内蒙古</td>\n",
       "      <td>51.04</td>\n",
       "      <td>48.96</td>\n",
       "      <td>104.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>辽　宁</td>\n",
       "      <td>49.92</td>\n",
       "      <td>50.08</td>\n",
       "      <td>99.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>吉　林</td>\n",
       "      <td>49.92</td>\n",
       "      <td>50.08</td>\n",
       "      <td>99.69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>黑龙江</td>\n",
       "      <td>50.09</td>\n",
       "      <td>49.91</td>\n",
       "      <td>100.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>上　海</td>\n",
       "      <td>51.77</td>\n",
       "      <td>48.23</td>\n",
       "      <td>107.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>江　苏</td>\n",
       "      <td>50.78</td>\n",
       "      <td>49.22</td>\n",
       "      <td>103.15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>浙　江</td>\n",
       "      <td>52.16</td>\n",
       "      <td>47.84</td>\n",
       "      <td>109.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>安　徽</td>\n",
       "      <td>50.97</td>\n",
       "      <td>49.03</td>\n",
       "      <td>103.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>福　建</td>\n",
       "      <td>51.68</td>\n",
       "      <td>48.32</td>\n",
       "      <td>106.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>江　西</td>\n",
       "      <td>51.60</td>\n",
       "      <td>48.40</td>\n",
       "      <td>106.62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>山　东</td>\n",
       "      <td>50.66</td>\n",
       "      <td>49.34</td>\n",
       "      <td>102.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>河　南</td>\n",
       "      <td>50.15</td>\n",
       "      <td>49.85</td>\n",
       "      <td>100.60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>湖　北</td>\n",
       "      <td>51.42</td>\n",
       "      <td>48.58</td>\n",
       "      <td>105.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>湖　南</td>\n",
       "      <td>51.16</td>\n",
       "      <td>48.84</td>\n",
       "      <td>104.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>广　东</td>\n",
       "      <td>53.07</td>\n",
       "      <td>46.93</td>\n",
       "      <td>113.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>广　西</td>\n",
       "      <td>51.70</td>\n",
       "      <td>48.30</td>\n",
       "      <td>107.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>海　南</td>\n",
       "      <td>53.02</td>\n",
       "      <td>46.98</td>\n",
       "      <td>112.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>重　庆</td>\n",
       "      <td>50.55</td>\n",
       "      <td>49.45</td>\n",
       "      <td>102.21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>四　川</td>\n",
       "      <td>50.54</td>\n",
       "      <td>49.46</td>\n",
       "      <td>102.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>贵　州</td>\n",
       "      <td>51.10</td>\n",
       "      <td>48.90</td>\n",
       "      <td>104.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>云　南</td>\n",
       "      <td>51.73</td>\n",
       "      <td>48.27</td>\n",
       "      <td>107.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>西　藏</td>\n",
       "      <td>52.45</td>\n",
       "      <td>47.55</td>\n",
       "      <td>110.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>陕　西</td>\n",
       "      <td>51.17</td>\n",
       "      <td>48.83</td>\n",
       "      <td>104.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>甘　肃</td>\n",
       "      <td>50.76</td>\n",
       "      <td>49.24</td>\n",
       "      <td>103.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>青　海</td>\n",
       "      <td>51.21</td>\n",
       "      <td>48.79</td>\n",
       "      <td>104.97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>宁　夏</td>\n",
       "      <td>50.94</td>\n",
       "      <td>49.06</td>\n",
       "      <td>103.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>新　疆</td>\n",
       "      <td>51.66</td>\n",
       "      <td>48.34</td>\n",
       "      <td>106.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      0      1      2       3\n",
       "0    地区     比重     比重     性别比\n",
       "1    地区      男      女     性别比\n",
       "2   全　国  51.24  48.76  105.07\n",
       "3   北　京  51.14  48.86  104.65\n",
       "4   天　津  51.53  48.47  106.31\n",
       "5   河　北  50.50  49.50  102.02\n",
       "6   山　西  50.99  49.01  104.06\n",
       "7   内蒙古  51.04  48.96  104.26\n",
       "8   辽　宁  49.92  50.08   99.70\n",
       "9   吉　林  49.92  50.08   99.69\n",
       "10  黑龙江  50.09  49.91  100.35\n",
       "11  上　海  51.77  48.23  107.33\n",
       "12  江　苏  50.78  49.22  103.15\n",
       "13  浙　江  52.16  47.84  109.04\n",
       "14  安　徽  50.97  49.03  103.94\n",
       "15  福　建  51.68  48.32  106.94\n",
       "16  江　西  51.60  48.40  106.62\n",
       "17  山　东  50.66  49.34  102.67\n",
       "18  河　南  50.15  49.85  100.60\n",
       "19  湖　北  51.42  48.58  105.83\n",
       "20  湖　南  51.16  48.84  104.77\n",
       "21  广　东  53.07  46.93  113.08\n",
       "22  广　西  51.70  48.30  107.04\n",
       "23  海　南  53.02  46.98  112.86\n",
       "24  重　庆  50.55  49.45  102.21\n",
       "25  四　川  50.54  49.46  102.19\n",
       "26  贵　州  51.10  48.90  104.50\n",
       "27  云　南  51.73  48.27  107.16\n",
       "28  西　藏  52.45  47.55  110.32\n",
       "29  陕　西  51.17  48.83  104.79\n",
       "30  甘　肃  50.76  49.24  103.10\n",
       "31  青　海  51.21  48.79  104.97\n",
       "32  宁　夏  50.94  49.06  103.83\n",
       "33  新　疆  51.66  48.34  106.85"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sex composition of the population by region\n",
    "url='http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818823.html'\n",
    "df_人口性别=pd.read_html(url)[1]\n",
    "df_人口性别"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding, modifying, and deleting data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename the columns\n",
    "df_人口性别.columns = ['地区','男性比重(%)','女性比重(%)','性别比(%)']\n",
    "# df_人口性别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the two header rows that read_html parsed as data (index 0, 1)\n",
    "df_人口性别.drop(index=[0,1],inplace=True)\n",
    "# df_人口性别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Drop rows with missing values and reset the index\n",
    "df_人口性别构成 = df_人口性别.dropna().reset_index(drop=True)\n",
    "# df_人口性别构成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The region names contain full-width spaces (e.g. '北　京'), which prevents the map from rendering\n",
    "# Drop the region column\n",
    "df_人口性别构成.drop(['地区'],axis=1,inplace=True)\n",
    "# df_人口性别构成"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>男性比重(%)</th>\n",
       "      <th>女性比重(%)</th>\n",
       "      <th>性别比(%)</th>\n",
       "      <th>地区</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>51.24</td>\n",
       "      <td>48.76</td>\n",
       "      <td>105.07</td>\n",
       "      <td>全国</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>51.14</td>\n",
       "      <td>48.86</td>\n",
       "      <td>104.65</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>51.53</td>\n",
       "      <td>48.47</td>\n",
       "      <td>106.31</td>\n",
       "      <td>天津</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>50.50</td>\n",
       "      <td>49.50</td>\n",
       "      <td>102.02</td>\n",
       "      <td>河北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>50.99</td>\n",
       "      <td>49.01</td>\n",
       "      <td>104.06</td>\n",
       "      <td>山西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>51.04</td>\n",
       "      <td>48.96</td>\n",
       "      <td>104.26</td>\n",
       "      <td>内蒙古</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>49.92</td>\n",
       "      <td>50.08</td>\n",
       "      <td>99.70</td>\n",
       "      <td>辽宁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>49.92</td>\n",
       "      <td>50.08</td>\n",
       "      <td>99.69</td>\n",
       "      <td>吉林</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>50.09</td>\n",
       "      <td>49.91</td>\n",
       "      <td>100.35</td>\n",
       "      <td>黑龙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>51.77</td>\n",
       "      <td>48.23</td>\n",
       "      <td>107.33</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>50.78</td>\n",
       "      <td>49.22</td>\n",
       "      <td>103.15</td>\n",
       "      <td>江苏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>52.16</td>\n",
       "      <td>47.84</td>\n",
       "      <td>109.04</td>\n",
       "      <td>浙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>50.97</td>\n",
       "      <td>49.03</td>\n",
       "      <td>103.94</td>\n",
       "      <td>安徽</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>51.68</td>\n",
       "      <td>48.32</td>\n",
       "      <td>106.94</td>\n",
       "      <td>福建</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>51.60</td>\n",
       "      <td>48.40</td>\n",
       "      <td>106.62</td>\n",
       "      <td>江西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>50.66</td>\n",
       "      <td>49.34</td>\n",
       "      <td>102.67</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>50.15</td>\n",
       "      <td>49.85</td>\n",
       "      <td>100.60</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>51.42</td>\n",
       "      <td>48.58</td>\n",
       "      <td>105.83</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>51.16</td>\n",
       "      <td>48.84</td>\n",
       "      <td>104.77</td>\n",
       "      <td>湖南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>53.07</td>\n",
       "      <td>46.93</td>\n",
       "      <td>113.08</td>\n",
       "      <td>广东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>51.70</td>\n",
       "      <td>48.30</td>\n",
       "      <td>107.04</td>\n",
       "      <td>广西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>53.02</td>\n",
       "      <td>46.98</td>\n",
       "      <td>112.86</td>\n",
       "      <td>海南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>50.55</td>\n",
       "      <td>49.45</td>\n",
       "      <td>102.21</td>\n",
       "      <td>重庆</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>50.54</td>\n",
       "      <td>49.46</td>\n",
       "      <td>102.19</td>\n",
       "      <td>四川</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>51.10</td>\n",
       "      <td>48.90</td>\n",
       "      <td>104.50</td>\n",
       "      <td>贵州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>51.73</td>\n",
       "      <td>48.27</td>\n",
       "      <td>107.16</td>\n",
       "      <td>云南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>52.45</td>\n",
       "      <td>47.55</td>\n",
       "      <td>110.32</td>\n",
       "      <td>西藏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>51.17</td>\n",
       "      <td>48.83</td>\n",
       "      <td>104.79</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>50.76</td>\n",
       "      <td>49.24</td>\n",
       "      <td>103.10</td>\n",
       "      <td>甘肃</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>51.21</td>\n",
       "      <td>48.79</td>\n",
       "      <td>104.97</td>\n",
       "      <td>青海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>50.94</td>\n",
       "      <td>49.06</td>\n",
       "      <td>103.83</td>\n",
       "      <td>宁夏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>51.66</td>\n",
       "      <td>48.34</td>\n",
       "      <td>106.85</td>\n",
       "      <td>新疆</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   男性比重(%) 女性比重(%)  性别比(%)   地区\n",
       "0    51.24   48.76  105.07   全国\n",
       "1    51.14   48.86  104.65   北京\n",
       "2    51.53   48.47  106.31   天津\n",
       "3    50.50   49.50  102.02   河北\n",
       "4    50.99   49.01  104.06   山西\n",
       "5    51.04   48.96  104.26  内蒙古\n",
       "6    49.92   50.08   99.70   辽宁\n",
       "7    49.92   50.08   99.69   吉林\n",
       "8    50.09   49.91  100.35  黑龙江\n",
       "9    51.77   48.23  107.33   上海\n",
       "10   50.78   49.22  103.15   江苏\n",
       "11   52.16   47.84  109.04   浙江\n",
       "12   50.97   49.03  103.94   安徽\n",
       "13   51.68   48.32  106.94   福建\n",
       "14   51.60   48.40  106.62   江西\n",
       "15   50.66   49.34  102.67   山东\n",
       "16   50.15   49.85  100.60   河南\n",
       "17   51.42   48.58  105.83   湖北\n",
       "18   51.16   48.84  104.77   湖南\n",
       "19   53.07   46.93  113.08   广东\n",
       "20   51.70   48.30  107.04   广西\n",
       "21   53.02   46.98  112.86   海南\n",
       "22   50.55   49.45  102.21   重庆\n",
       "23   50.54   49.46  102.19   四川\n",
       "24   51.10   48.90  104.50   贵州\n",
       "25   51.73   48.27  107.16   云南\n",
       "26   52.45   47.55  110.32   西藏\n",
       "27   51.17   48.83  104.79   陕西\n",
       "28   50.76   49.24  103.10   甘肃\n",
       "29   51.21   48.79  104.97   青海\n",
       "30   50.94   49.06  103.83   宁夏\n",
       "31   51.66   48.34  106.85   新疆"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add the region names back, without the spaces\n",
    "df_人口性别构成.loc[:,'地区'] = ['全国','北京','天津','河北','山西','内蒙古','辽宁','吉林','黑龙江','上海','江苏','浙江','安徽','福建','江西','山东','河南','湖北','湖南','广东','广西','海南','重庆','四川','贵州','云南','西藏','陕西','甘肃','青海','宁夏','新疆']\n",
    "df_人口性别构成"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exporting the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_人口性别构成.to_csv('data/各地区人口性别构成数据.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Number of people at each education level per 100,000 population, by region"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>地区</td>\n",
       "      <td>大学 （大专及以上）</td>\n",
       "      <td>高中 （含中专）</td>\n",
       "      <td>初中</td>\n",
       "      <td>小学</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>全　国</td>\n",
       "      <td>15467</td>\n",
       "      <td>15088</td>\n",
       "      <td>34507</td>\n",
       "      <td>24767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>北　京</td>\n",
       "      <td>41980</td>\n",
       "      <td>17593</td>\n",
       "      <td>23289</td>\n",
       "      <td>10503</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>天　津</td>\n",
       "      <td>26940</td>\n",
       "      <td>17719</td>\n",
       "      <td>32294</td>\n",
       "      <td>16123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>河　北</td>\n",
       "      <td>12418</td>\n",
       "      <td>13861</td>\n",
       "      <td>39950</td>\n",
       "      <td>24664</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>山　西</td>\n",
       "      <td>17358</td>\n",
       "      <td>16485</td>\n",
       "      <td>38950</td>\n",
       "      <td>19506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>内蒙古</td>\n",
       "      <td>18688</td>\n",
       "      <td>14814</td>\n",
       "      <td>33861</td>\n",
       "      <td>23627</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>辽　宁</td>\n",
       "      <td>18216</td>\n",
       "      <td>14670</td>\n",
       "      <td>42799</td>\n",
       "      <td>18888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>吉　林</td>\n",
       "      <td>16738</td>\n",
       "      <td>17080</td>\n",
       "      <td>38234</td>\n",
       "      <td>22318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>黑龙江</td>\n",
       "      <td>14793</td>\n",
       "      <td>15525</td>\n",
       "      <td>42793</td>\n",
       "      <td>21863</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>上　海</td>\n",
       "      <td>33872</td>\n",
       "      <td>19020</td>\n",
       "      <td>28935</td>\n",
       "      <td>11929</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>江　苏</td>\n",
       "      <td>18663</td>\n",
       "      <td>16191</td>\n",
       "      <td>33308</td>\n",
       "      <td>22742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>浙　江</td>\n",
       "      <td>16990</td>\n",
       "      <td>14555</td>\n",
       "      <td>32706</td>\n",
       "      <td>26384</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>安　徽</td>\n",
       "      <td>13280</td>\n",
       "      <td>13294</td>\n",
       "      <td>33724</td>\n",
       "      <td>26875</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>福　建</td>\n",
       "      <td>14148</td>\n",
       "      <td>14212</td>\n",
       "      <td>32218</td>\n",
       "      <td>28031</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>江　西</td>\n",
       "      <td>11897</td>\n",
       "      <td>15145</td>\n",
       "      <td>35501</td>\n",
       "      <td>27514</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>山　东</td>\n",
       "      <td>14384</td>\n",
       "      <td>14334</td>\n",
       "      <td>35778</td>\n",
       "      <td>23693</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>河　南</td>\n",
       "      <td>11744</td>\n",
       "      <td>15239</td>\n",
       "      <td>37518</td>\n",
       "      <td>24557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>湖　北</td>\n",
       "      <td>15502</td>\n",
       "      <td>17428</td>\n",
       "      <td>34280</td>\n",
       "      <td>23520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>湖　南</td>\n",
       "      <td>12239</td>\n",
       "      <td>17776</td>\n",
       "      <td>35636</td>\n",
       "      <td>25214</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>广　东</td>\n",
       "      <td>15699</td>\n",
       "      <td>18224</td>\n",
       "      <td>35484</td>\n",
       "      <td>20676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>广　西</td>\n",
       "      <td>10806</td>\n",
       "      <td>12962</td>\n",
       "      <td>36388</td>\n",
       "      <td>27855</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>海　南</td>\n",
       "      <td>13919</td>\n",
       "      <td>15561</td>\n",
       "      <td>40174</td>\n",
       "      <td>19701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>重　庆</td>\n",
       "      <td>15412</td>\n",
       "      <td>15956</td>\n",
       "      <td>30582</td>\n",
       "      <td>29894</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>四　川</td>\n",
       "      <td>13267</td>\n",
       "      <td>13301</td>\n",
       "      <td>31443</td>\n",
       "      <td>31317</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>贵　州</td>\n",
       "      <td>10952</td>\n",
       "      <td>9951</td>\n",
       "      <td>30464</td>\n",
       "      <td>31921</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>云　南</td>\n",
       "      <td>11601</td>\n",
       "      <td>10338</td>\n",
       "      <td>29241</td>\n",
       "      <td>35667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>西　藏</td>\n",
       "      <td>11019</td>\n",
       "      <td>7051</td>\n",
       "      <td>15757</td>\n",
       "      <td>32108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>陕　西</td>\n",
       "      <td>18397</td>\n",
       "      <td>15581</td>\n",
       "      <td>33979</td>\n",
       "      <td>21686</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>甘　肃</td>\n",
       "      <td>14506</td>\n",
       "      <td>12937</td>\n",
       "      <td>27423</td>\n",
       "      <td>29808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>青　海</td>\n",
       "      <td>14880</td>\n",
       "      <td>10568</td>\n",
       "      <td>24344</td>\n",
       "      <td>32725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>宁　夏</td>\n",
       "      <td>17340</td>\n",
       "      <td>13432</td>\n",
       "      <td>29717</td>\n",
       "      <td>26111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>新　疆</td>\n",
       "      <td>16536</td>\n",
       "      <td>13208</td>\n",
       "      <td>31559</td>\n",
       "      <td>28405</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      0           1         2      3      4\n",
       "0    地区  大学 （大专及以上）  高中 （含中专）     初中     小学\n",
       "1   全　国       15467     15088  34507  24767\n",
       "2   北　京       41980     17593  23289  10503\n",
       "3   天　津       26940     17719  32294  16123\n",
       "4   河　北       12418     13861  39950  24664\n",
       "5   山　西       17358     16485  38950  19506\n",
       "6   内蒙古       18688     14814  33861  23627\n",
       "7   辽　宁       18216     14670  42799  18888\n",
       "8   吉　林       16738     17080  38234  22318\n",
       "9   黑龙江       14793     15525  42793  21863\n",
       "10  上　海       33872     19020  28935  11929\n",
       "11  江　苏       18663     16191  33308  22742\n",
       "12  浙　江       16990     14555  32706  26384\n",
       "13  安　徽       13280     13294  33724  26875\n",
       "14  福　建       14148     14212  32218  28031\n",
       "15  江　西       11897     15145  35501  27514\n",
       "16  山　东       14384     14334  35778  23693\n",
       "17  河　南       11744     15239  37518  24557\n",
       "18  湖　北       15502     17428  34280  23520\n",
       "19  湖　南       12239     17776  35636  25214\n",
       "20  广　东       15699     18224  35484  20676\n",
       "21  广　西       10806     12962  36388  27855\n",
       "22  海　南       13919     15561  40174  19701\n",
       "23  重　庆       15412     15956  30582  29894\n",
       "24  四　川       13267     13301  31443  31317\n",
       "25  贵　州       10952      9951  30464  31921\n",
       "26  云　南       11601     10338  29241  35667\n",
       "27  西　藏       11019      7051  15757  32108\n",
       "28  陕　西       18397     15581  33979  21686\n",
       "29  甘　肃       14506     12937  27423  29808\n",
       "30  青　海       14880     10568  24344  32725\n",
       "31  宁　夏       17340     13432  29717  26111\n",
       "32  新　疆       16536     13208  31559  28405"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Number of people at each education level per 100,000 population, by region\n",
    "url='http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818825.html'\n",
    "df_每10万人口受教育程度人数=pd.read_html(url)[1]\n",
    "df_每10万人口受教育程度人数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding, modifying, and deleting data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename the columns\n",
    "df_每10万人口受教育程度人数.columns = ['地区','大学（大专及以上）','高中（含中专）','初中','小学']\n",
    "# df_每10万人口受教育程度人数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop row 0 (the table's header row)\n",
    "df_每10万人口受教育程度人数.drop(index=[0],inplace=True)\n",
    "# df_每10万人口受教育程度人数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Drop rows with missing values and reset the index\n",
    "df_各地区每10万人口受教育程度人数 = df_每10万人口受教育程度人数.dropna().reset_index(drop=True)\n",
    "# df_各地区每10万人口受教育程度人数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The region names contain full-width spaces (e.g. '北　京'), which prevents the map from rendering\n",
    "# Drop the region column\n",
    "df_各地区每10万人口受教育程度人数.drop(['地区'],axis=1,inplace=True)\n",
    "# df_各地区每10万人口受教育程度人数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>大学（大专及以上）</th>\n",
       "      <th>高中（含中专）</th>\n",
       "      <th>初中</th>\n",
       "      <th>小学</th>\n",
       "      <th>地区</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>15467</td>\n",
       "      <td>15088</td>\n",
       "      <td>34507</td>\n",
       "      <td>24767</td>\n",
       "      <td>全国</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>41980</td>\n",
       "      <td>17593</td>\n",
       "      <td>23289</td>\n",
       "      <td>10503</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>26940</td>\n",
       "      <td>17719</td>\n",
       "      <td>32294</td>\n",
       "      <td>16123</td>\n",
       "      <td>天津</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12418</td>\n",
       "      <td>13861</td>\n",
       "      <td>39950</td>\n",
       "      <td>24664</td>\n",
       "      <td>河北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17358</td>\n",
       "      <td>16485</td>\n",
       "      <td>38950</td>\n",
       "      <td>19506</td>\n",
       "      <td>山西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>18688</td>\n",
       "      <td>14814</td>\n",
       "      <td>33861</td>\n",
       "      <td>23627</td>\n",
       "      <td>内蒙古</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>18216</td>\n",
       "      <td>14670</td>\n",
       "      <td>42799</td>\n",
       "      <td>18888</td>\n",
       "      <td>辽宁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>16738</td>\n",
       "      <td>17080</td>\n",
       "      <td>38234</td>\n",
       "      <td>22318</td>\n",
       "      <td>吉林</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>14793</td>\n",
       "      <td>15525</td>\n",
       "      <td>42793</td>\n",
       "      <td>21863</td>\n",
       "      <td>黑龙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>33872</td>\n",
       "      <td>19020</td>\n",
       "      <td>28935</td>\n",
       "      <td>11929</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>18663</td>\n",
       "      <td>16191</td>\n",
       "      <td>33308</td>\n",
       "      <td>22742</td>\n",
       "      <td>江苏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>16990</td>\n",
       "      <td>14555</td>\n",
       "      <td>32706</td>\n",
       "      <td>26384</td>\n",
       "      <td>浙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13280</td>\n",
       "      <td>13294</td>\n",
       "      <td>33724</td>\n",
       "      <td>26875</td>\n",
       "      <td>安徽</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14148</td>\n",
       "      <td>14212</td>\n",
       "      <td>32218</td>\n",
       "      <td>28031</td>\n",
       "      <td>福建</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>11897</td>\n",
       "      <td>15145</td>\n",
       "      <td>35501</td>\n",
       "      <td>27514</td>\n",
       "      <td>江西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>14384</td>\n",
       "      <td>14334</td>\n",
       "      <td>35778</td>\n",
       "      <td>23693</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>11744</td>\n",
       "      <td>15239</td>\n",
       "      <td>37518</td>\n",
       "      <td>24557</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>15502</td>\n",
       "      <td>17428</td>\n",
       "      <td>34280</td>\n",
       "      <td>23520</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>12239</td>\n",
       "      <td>17776</td>\n",
       "      <td>35636</td>\n",
       "      <td>25214</td>\n",
       "      <td>湖南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>15699</td>\n",
       "      <td>18224</td>\n",
       "      <td>35484</td>\n",
       "      <td>20676</td>\n",
       "      <td>广东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>10806</td>\n",
       "      <td>12962</td>\n",
       "      <td>36388</td>\n",
       "      <td>27855</td>\n",
       "      <td>广西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>13919</td>\n",
       "      <td>15561</td>\n",
       "      <td>40174</td>\n",
       "      <td>19701</td>\n",
       "      <td>海南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>15412</td>\n",
       "      <td>15956</td>\n",
       "      <td>30582</td>\n",
       "      <td>29894</td>\n",
       "      <td>重庆</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>13267</td>\n",
       "      <td>13301</td>\n",
       "      <td>31443</td>\n",
       "      <td>31317</td>\n",
       "      <td>四川</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>10952</td>\n",
       "      <td>9951</td>\n",
       "      <td>30464</td>\n",
       "      <td>31921</td>\n",
       "      <td>贵州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>11601</td>\n",
       "      <td>10338</td>\n",
       "      <td>29241</td>\n",
       "      <td>35667</td>\n",
       "      <td>云南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>11019</td>\n",
       "      <td>7051</td>\n",
       "      <td>15757</td>\n",
       "      <td>32108</td>\n",
       "      <td>西藏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>18397</td>\n",
       "      <td>15581</td>\n",
       "      <td>33979</td>\n",
       "      <td>21686</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>14506</td>\n",
       "      <td>12937</td>\n",
       "      <td>27423</td>\n",
       "      <td>29808</td>\n",
       "      <td>甘肃</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>14880</td>\n",
       "      <td>10568</td>\n",
       "      <td>24344</td>\n",
       "      <td>32725</td>\n",
       "      <td>青海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>17340</td>\n",
       "      <td>13432</td>\n",
       "      <td>29717</td>\n",
       "      <td>26111</td>\n",
       "      <td>宁夏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>16536</td>\n",
       "      <td>13208</td>\n",
       "      <td>31559</td>\n",
       "      <td>28405</td>\n",
       "      <td>新疆</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   大学（大专及以上） 高中（含中专）     初中     小学   地区\n",
       "0      15467   15088  34507  24767   全国\n",
       "1      41980   17593  23289  10503   北京\n",
       "2      26940   17719  32294  16123   天津\n",
       "3      12418   13861  39950  24664   河北\n",
       "4      17358   16485  38950  19506   山西\n",
       "5      18688   14814  33861  23627  内蒙古\n",
       "6      18216   14670  42799  18888   辽宁\n",
       "7      16738   17080  38234  22318   吉林\n",
       "8      14793   15525  42793  21863  黑龙江\n",
       "9      33872   19020  28935  11929   上海\n",
       "10     18663   16191  33308  22742   江苏\n",
       "11     16990   14555  32706  26384   浙江\n",
       "12     13280   13294  33724  26875   安徽\n",
       "13     14148   14212  32218  28031   福建\n",
       "14     11897   15145  35501  27514   江西\n",
       "15     14384   14334  35778  23693   山东\n",
       "16     11744   15239  37518  24557   河南\n",
       "17     15502   17428  34280  23520   湖北\n",
       "18     12239   17776  35636  25214   湖南\n",
       "19     15699   18224  35484  20676   广东\n",
       "20     10806   12962  36388  27855   广西\n",
       "21     13919   15561  40174  19701   海南\n",
       "22     15412   15956  30582  29894   重庆\n",
       "23     13267   13301  31443  31317   四川\n",
       "24     10952    9951  30464  31921   贵州\n",
       "25     11601   10338  29241  35667   云南\n",
       "26     11019    7051  15757  32108   西藏\n",
       "27     18397   15581  33979  21686   陕西\n",
       "28     14506   12937  27423  29808   甘肃\n",
       "29     14880   10568  24344  32725   青海\n",
       "30     17340   13432  29717  26111   宁夏\n",
       "31     16536   13208  31559  28405   新疆"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add the corrected region names (without embedded spaces)\n",
    "df_各地区每10万人口受教育程度人数.loc[:,'地区'] = ['全国','北京','天津','河北','山西','内蒙古','辽宁','吉林','黑龙江','上海','江苏','浙江','安徽','福建','江西','山东','河南','湖北','湖南','广东','广西','海南','重庆','四川','贵州','云南','西藏','陕西','甘肃','青海','宁夏','新疆']\n",
    "df_各地区每10万人口受教育程度人数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_各地区每10万人口受教育程度人数.to_csv('data/各地区每10万人口中拥有的各类受教育程度人数数据.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Average years of schooling of the population aged 15 and over, by region"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>地区</td>\n",
       "      <td>2020年</td>\n",
       "      <td>2010年</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>全　国</td>\n",
       "      <td>9.91</td>\n",
       "      <td>9.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>北　京</td>\n",
       "      <td>12.64</td>\n",
       "      <td>11.71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>天　津</td>\n",
       "      <td>11.29</td>\n",
       "      <td>10.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>河　北</td>\n",
       "      <td>9.84</td>\n",
       "      <td>9.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>山　西</td>\n",
       "      <td>10.45</td>\n",
       "      <td>9.52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>内蒙古</td>\n",
       "      <td>10.08</td>\n",
       "      <td>9.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>辽　宁</td>\n",
       "      <td>10.34</td>\n",
       "      <td>9.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>吉　林</td>\n",
       "      <td>10.17</td>\n",
       "      <td>9.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>黑龙江</td>\n",
       "      <td>9.93</td>\n",
       "      <td>9.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>上　海</td>\n",
       "      <td>11.81</td>\n",
       "      <td>10.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>江　苏</td>\n",
       "      <td>10.21</td>\n",
       "      <td>9.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>浙　江</td>\n",
       "      <td>9.79</td>\n",
       "      <td>8.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>安　徽</td>\n",
       "      <td>9.35</td>\n",
       "      <td>8.28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>福　建</td>\n",
       "      <td>9.66</td>\n",
       "      <td>9.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>江　西</td>\n",
       "      <td>9.70</td>\n",
       "      <td>8.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>山　东</td>\n",
       "      <td>9.75</td>\n",
       "      <td>8.97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>河　南</td>\n",
       "      <td>9.79</td>\n",
       "      <td>8.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>湖　北</td>\n",
       "      <td>10.02</td>\n",
       "      <td>9.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>湖　南</td>\n",
       "      <td>9.88</td>\n",
       "      <td>9.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>广　东</td>\n",
       "      <td>10.38</td>\n",
       "      <td>9.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>广　西</td>\n",
       "      <td>9.54</td>\n",
       "      <td>8.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>海　南</td>\n",
       "      <td>10.10</td>\n",
       "      <td>9.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>重　庆</td>\n",
       "      <td>9.80</td>\n",
       "      <td>8.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>四　川</td>\n",
       "      <td>9.24</td>\n",
       "      <td>8.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>贵　州</td>\n",
       "      <td>8.75</td>\n",
       "      <td>7.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>云　南</td>\n",
       "      <td>8.82</td>\n",
       "      <td>7.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>西　藏</td>\n",
       "      <td>6.75</td>\n",
       "      <td>5.25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>陕　西</td>\n",
       "      <td>10.26</td>\n",
       "      <td>9.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>甘　肃</td>\n",
       "      <td>9.13</td>\n",
       "      <td>8.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>青　海</td>\n",
       "      <td>8.85</td>\n",
       "      <td>7.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>宁　夏</td>\n",
       "      <td>9.81</td>\n",
       "      <td>8.82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>新　疆</td>\n",
       "      <td>10.11</td>\n",
       "      <td>9.27</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      0      1      2\n",
       "0    地区  2020年  2010年\n",
       "1   全　国   9.91   9.08\n",
       "2   北　京  12.64  11.71\n",
       "3   天　津  11.29  10.38\n",
       "4   河　北   9.84   9.12\n",
       "5   山　西  10.45   9.52\n",
       "6   内蒙古  10.08   9.22\n",
       "7   辽　宁  10.34   9.67\n",
       "8   吉　林  10.17   9.49\n",
       "9   黑龙江   9.93   9.36\n",
       "10  上　海  11.81  10.73\n",
       "11  江　苏  10.21   9.32\n",
       "12  浙　江   9.79   8.79\n",
       "13  安　徽   9.35   8.28\n",
       "14  福　建   9.66   9.02\n",
       "15  江　西   9.70   8.86\n",
       "16  山　东   9.75   8.97\n",
       "17  河　南   9.79   8.95\n",
       "18  湖　北  10.02   9.20\n",
       "19  湖　南   9.88   9.16\n",
       "20  广　东  10.38   9.55\n",
       "21  广　西   9.54   8.76\n",
       "22  海　南  10.10   9.22\n",
       "23  重　庆   9.80   8.75\n",
       "24  四　川   9.24   8.35\n",
       "25  贵　州   8.75   7.65\n",
       "26  云　南   8.82   7.76\n",
       "27  西　藏   6.75   5.25\n",
       "28  陕　西  10.26   9.36\n",
       "29  甘　肃   9.13   8.19\n",
       "30  青　海   8.85   7.85\n",
       "31  宁　夏   9.81   8.82\n",
       "32  新　疆  10.11   9.27"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Average years of schooling of the population aged 15 and over, by region (third table on the page)\n",
    "url='http://www.stats.gov.cn/tjsj/tjgb/rkpcgb/qgrkpcgb/202106/t20210628_1818825.html'\n",
    "df_15岁及以上人口平均受教育年限 = pd.read_html(url)[2]\n",
    "df_15岁及以上人口平均受教育年限"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding, modifying, and deleting data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename the columns\n",
    "df_15岁及以上人口平均受教育年限.columns = ['地区','2020年','2010年']\n",
    "# df_15岁及以上人口平均受教育年限"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop row 0, which holds the original header text\n",
    "df_15岁及以上人口平均受教育年限.drop(index=[0],inplace=True)\n",
    "# df_15岁及以上人口平均受教育年限"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Drop rows with missing values and reset the index\n",
    "df_各地区15岁及以上人口平均受教育年限 = df_15岁及以上人口平均受教育年限.dropna().reset_index(drop=True)\n",
    "# df_各地区15岁及以上人口平均受教育年限"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The region names contain embedded spaces, which would prevent the map from rendering\n",
    "# Drop the 地区 column\n",
    "df_各地区15岁及以上人口平均受教育年限.drop(['地区'],axis=1,inplace=True)\n",
    "# df_各地区15岁及以上人口平均受教育年限"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2020年</th>\n",
       "      <th>2010年</th>\n",
       "      <th>地区</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9.91</td>\n",
       "      <td>9.08</td>\n",
       "      <td>全国</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12.64</td>\n",
       "      <td>11.71</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11.29</td>\n",
       "      <td>10.38</td>\n",
       "      <td>天津</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9.84</td>\n",
       "      <td>9.12</td>\n",
       "      <td>河北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10.45</td>\n",
       "      <td>9.52</td>\n",
       "      <td>山西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>10.08</td>\n",
       "      <td>9.22</td>\n",
       "      <td>内蒙古</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>10.34</td>\n",
       "      <td>9.67</td>\n",
       "      <td>辽宁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>10.17</td>\n",
       "      <td>9.49</td>\n",
       "      <td>吉林</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9.93</td>\n",
       "      <td>9.36</td>\n",
       "      <td>黑龙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>11.81</td>\n",
       "      <td>10.73</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10.21</td>\n",
       "      <td>9.32</td>\n",
       "      <td>江苏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>9.79</td>\n",
       "      <td>8.79</td>\n",
       "      <td>浙江</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>9.35</td>\n",
       "      <td>8.28</td>\n",
       "      <td>安徽</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>9.66</td>\n",
       "      <td>9.02</td>\n",
       "      <td>福建</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>9.70</td>\n",
       "      <td>8.86</td>\n",
       "      <td>江西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>9.75</td>\n",
       "      <td>8.97</td>\n",
       "      <td>山东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>9.79</td>\n",
       "      <td>8.95</td>\n",
       "      <td>河南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>10.02</td>\n",
       "      <td>9.20</td>\n",
       "      <td>湖北</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>9.88</td>\n",
       "      <td>9.16</td>\n",
       "      <td>湖南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>10.38</td>\n",
       "      <td>9.55</td>\n",
       "      <td>广东</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>9.54</td>\n",
       "      <td>8.76</td>\n",
       "      <td>广西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>10.10</td>\n",
       "      <td>9.22</td>\n",
       "      <td>海南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>9.80</td>\n",
       "      <td>8.75</td>\n",
       "      <td>重庆</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>9.24</td>\n",
       "      <td>8.35</td>\n",
       "      <td>四川</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>8.75</td>\n",
       "      <td>7.65</td>\n",
       "      <td>贵州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>8.82</td>\n",
       "      <td>7.76</td>\n",
       "      <td>云南</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>6.75</td>\n",
       "      <td>5.25</td>\n",
       "      <td>西藏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>10.26</td>\n",
       "      <td>9.36</td>\n",
       "      <td>陕西</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>9.13</td>\n",
       "      <td>8.19</td>\n",
       "      <td>甘肃</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>8.85</td>\n",
       "      <td>7.85</td>\n",
       "      <td>青海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>9.81</td>\n",
       "      <td>8.82</td>\n",
       "      <td>宁夏</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>10.11</td>\n",
       "      <td>9.27</td>\n",
       "      <td>新疆</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    2020年  2010年   地区\n",
       "0    9.91   9.08   全国\n",
       "1   12.64  11.71   北京\n",
       "2   11.29  10.38   天津\n",
       "3    9.84   9.12   河北\n",
       "4   10.45   9.52   山西\n",
       "5   10.08   9.22  内蒙古\n",
       "6   10.34   9.67   辽宁\n",
       "7   10.17   9.49   吉林\n",
       "8    9.93   9.36  黑龙江\n",
       "9   11.81  10.73   上海\n",
       "10  10.21   9.32   江苏\n",
       "11   9.79   8.79   浙江\n",
       "12   9.35   8.28   安徽\n",
       "13   9.66   9.02   福建\n",
       "14   9.70   8.86   江西\n",
       "15   9.75   8.97   山东\n",
       "16   9.79   8.95   河南\n",
       "17  10.02   9.20   湖北\n",
       "18   9.88   9.16   湖南\n",
       "19  10.38   9.55   广东\n",
       "20   9.54   8.76   广西\n",
       "21  10.10   9.22   海南\n",
       "22   9.80   8.75   重庆\n",
       "23   9.24   8.35   四川\n",
       "24   8.75   7.65   贵州\n",
       "25   8.82   7.76   云南\n",
       "26   6.75   5.25   西藏\n",
       "27  10.26   9.36   陕西\n",
       "28   9.13   8.19   甘肃\n",
       "29   8.85   7.85   青海\n",
       "30   9.81   8.82   宁夏\n",
       "31  10.11   9.27   新疆"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add the corrected region names (without embedded spaces)\n",
    "df_各地区15岁及以上人口平均受教育年限.loc[:,'地区'] = ['全国','北京','天津','河北','山西','内蒙古','辽宁','吉林','黑龙江','上海','江苏','浙江','安徽','福建','江西','山东','河南','湖北','湖南','广东','广西','海南','重庆','四川','贵州','云南','西藏','陕西','甘肃','青海','宁夏','新疆']\n",
    "df_各地区15岁及以上人口平均受教育年限"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_各地区15岁及以上人口平均受教育年限.to_csv('data/各地区15岁及以上人口平均受教育年限.csv')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "164.977px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
